Results 1 - 14 of 14
1.
J Hematol Oncol ; 16(1): 120, 2023 12 15.
Article in English | MEDLINE | ID: mdl-38102665

ABSTRACT

Global proteomic data generated by advanced mass spectrometry (MS) technologies can help bridge the gap between genome/transcriptome and functions and hold great potential in elucidating unbiased functional models of pro-tumorigenic pathways. To this end, we collected the high-throughput, whole-genome MS data and conducted integrative proteomic network analyses of 687 cases across 7 cancer types including breast carcinoma (115 tumor samples; 10,438 genes), clear cell renal carcinoma (100 tumor samples; 9,910 genes), colorectal cancer (91 tumor samples; 7,362 genes), hepatocellular carcinoma (101 tumor samples; 6,478 genes), lung adenocarcinoma (104 tumor samples; 10,967 genes), stomach adenocarcinoma (80 tumor samples; 9,268 genes), and uterine corpus endometrial carcinoma, UCEC (96 tumor samples; 10,768 genes). Through the protein co-expression network analysis, we identified co-expressed protein modules enriched for differentially expressed proteins in tumor as disease-associated pathways. Comparison with the respective transcriptome network models revealed proteome-specific cancer subnetworks associated with heme metabolism, DNA repair, spliceosome, oxidative phosphorylation and several oncogenic signaling pathways. Cross-cancer comparison identified highly preserved protein modules showing robust pan-cancer interactions and identified endoplasmic reticulum-associated degradation (ERAD) and N-acetyltransferase activity as the central functional axes. We further utilized these network models to predict pan-cancer protein regulators of disease-associated pathways. The top predicted pan-cancer regulators, including RSL1D1, DDX21 and SMC2, were experimentally validated in lung, colon, breast cancer and fetal kidney cells. In summary, this study has developed interpretable network models of cancer proteomes, showcasing their potential in unveiling novel oncogenic regulators, elucidating underlying mechanisms, and identifying new therapeutic targets.


Subject(s)
Adenocarcinoma , Kidney Neoplasms , Liver Neoplasms , Lung Neoplasms , Pregnancy Proteins , Humans , Proteomics , Endoplasmic Reticulum-Associated Degradation , Gene Expression Profiling/methods , Adenocarcinoma/genetics , Lung Neoplasms/genetics , Pregnancy Proteins/genetics , Ribosomal Proteins/genetics , DEAD-box RNA Helicases/genetics
2.
bioRxiv ; 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-38014015

ABSTRACT

Cancer mutations are often assumed to alter proteins, thus promoting tumorigenesis. However, how mutations affect protein expression has rarely been systematically investigated. We conduct a comprehensive analysis of mutation impacts on mRNA- and protein-level expressions of 953 cancer cases with paired genomics and global proteomic profiling across six cancer types. Protein-level impacts are validated for 47.2% of the somatic expression quantitative trait loci (seQTLs), including mutations from likely "long-tail" driver genes. Devising a statistical pipeline for identifying somatic protein-specific QTLs (spsQTLs), we reveal several gene mutations, including NF1 and MAP2K4 truncations and TP53 missenses showing disproportional influence on protein abundance not readily explained by transcriptomics. Cross-validating with data from massively parallel assays of variant effects (MAVE), TP53 missenses associated with high tumor TP53 proteins were experimentally confirmed as functional. Our study demonstrates the importance of considering protein-level expression to validate mutation impacts and identify functional genes and mutations.

3.
bioRxiv ; 2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36865220

ABSTRACT

Structural features of proteins capture underlying information about protein evolution and function, which enhances the analysis of proteomic and transcriptomic data. Here we develop Structural Analysis of Gene and protein Expression Signatures (SAGES), a method that describes expression data using features calculated from sequence-based prediction methods and 3D structural models. We used SAGES, along with machine learning, to characterize tissues from healthy individuals and those with breast cancer. We analyzed gene expression data from 23 breast cancer patients and genetic mutation data from the COSMIC database as well as 17 breast tumor protein expression profiles. We identified prominent expression of intrinsically disordered regions in breast cancer proteins as well as relationships between drug perturbation signatures and breast cancer disease signatures. Our results suggest that SAGES is generally applicable to describe diverse biological phenomena including disease states and drug effects.

4.
Front Oncol ; 12: 814120, 2022.
Article in English | MEDLINE | ID: mdl-35433463

ABSTRACT

Hepatocellular carcinoma (HCC) is the fourth leading cause of cancer-related mortality worldwide. While many targeted therapies have been developed, the majority of HCC tumors do not harbor clinically actionable mutations. Protein-level aberrations, especially those not evident at the genomic level, present therapeutic opportunities but have rarely been systematically characterized in HCC. In this study, we performed proteogenomic analyses of 260 primary tumors from two HBV-related HCC patient cohorts with global mass-spectrometry (MS) proteomics data. Combining tumor-normal and inter-tumor analyses, we identified overexpressed targets including PDGFRB, FGFR4, ERBB2/3, CDK6 kinases and MFAP5, HMCN1, and Hsp proteins in HCC, many of which showed low frequencies of genomic and/or transcriptomic aberrations. Protein expression of FGFR4 kinase and Hsp proteins was significantly associated with response to their corresponding inhibitors. Our results provide a catalog of protein targets in HCC and demonstrate the potential of proteomics approaches in advancing precision medicine in cancer types lacking druggable mutations.

5.
Commun Biol ; 4(1): 1112, 2021 09 22.
Article in English | MEDLINE | ID: mdl-34552204

ABSTRACT

Identifying genomic alterations of cancer proteins has guided the development of targeted therapies, but proteomic analyses are required to validate and reveal new treatment opportunities. Herein, we develop a new algorithm, OPPTI, to discover overexpressed kinase proteins across 10 cancer types using global mass spectrometry proteomics data of 1,071 cases. OPPTI outperforms existing methods by leveraging multiple co-expressed markers to identify targets overexpressed in a subset of tumors. OPPTI-identified overexpression of ERBB2 and EGFR proteins correlates with genomic amplifications, while CDK4/6, PDK1, and MET protein overexpression frequently occurs without corresponding DNA- and RNA-level alterations. Analyzing CRISPR screen data, we confirm expression-driven dependencies of multiple currently-druggable and new target kinases whose expressions are validated by immunochemistry. Identified kinases are further associated with up-regulated phosphorylation levels of corresponding signaling pathways. Collectively, our results reveal that protein-level aberrations, sometimes not observed by genomics, represent cancer vulnerabilities that may be targeted in precision oncology.


Subject(s)
Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Protein Kinases/genetics , Proteogenomics/methods , Up-Regulation , Adult , Aged , Algorithms , Child , Female , Humans , Male , Middle Aged , Neoplasms/physiopathology , Phosphorylation , Protein Kinases/metabolism , Signal Transduction
6.
Sci Rep ; 11(1): 12107, 2021 06 08.
Article in English | MEDLINE | ID: mdl-34103633

ABSTRACT

Effective treatments targeting disease etiology are urgently needed for Alzheimer's disease (AD). Although candidate AD genes have been identified and altering their levels may serve as therapeutic strategies, the consequences of such alterations remain largely unknown. Herein, we analyzed CRISPR knockout/RNAi knockdown screen data for over 700 cell lines and evaluated cellular dependencies of 104 AD-associated genes previously identified by genome-wide association studies (GWAS) and gene expression network studies. Multiple genes showed widespread cell dependencies across tissue lineages, suggesting their inhibition may yield off-target effects. Meanwhile, several genes, including SPI1, MEF2C, GAB2, ABCC11, and ACTG1, were identified as genes of interest since their genetic knockouts specifically affected high-expressing cells whose tissue lineages are relevant to cell types found in AD. Overall, analyses of genetic screen data identified AD-associated genes whose knockout or knockdown selectively affected cell lines of relevant tissue lineages, prioritizing targets for potential AD treatments.


Subject(s)
Alzheimer Disease/genetics , Alzheimer Disease/physiopathology , CRISPR-Cas Systems , Genetic Predisposition to Disease , ATP-Binding Cassette Transporters/genetics , Actins/genetics , Adaptor Proteins, Signal Transducing/genetics , Cell Lineage , Gene Expression Profiling , Gene Regulatory Networks , Genome-Wide Association Study , Humans , MEF2 Transcription Factors/genetics , Microglia/metabolism , Nervous System Diseases/genetics , Polymorphism, Single Nucleotide , Proto-Oncogene Proteins/genetics , RNA Interference , Risk , Trans-Activators/genetics
7.
Nat Commun ; 12(1): 2313, 2021 04 19.
Article in English | MEDLINE | ID: mdl-33875650

ABSTRACT

Advances in mass spectrometry have generated increasingly large-scale proteomics datasets containing tens of thousands of phosphorylation sites (phosphosites) that require prioritization. We develop a bioinformatics tool called HotPho and systematically discover 3D co-clustering of phosphosites and cancer mutations on protein structures. HotPho identifies 474 such hybrid clusters containing 1,255 co-clustering phosphosites, including RET p.S904/Y928, the conserved HRAS/KRAS p.Y96, and IDH1 p.Y139/IDH2 p.Y179 that are adjacent to recurrent mutations on protein structures not found by linear proximity approaches. Hybrid clusters, enriched in histone and kinase domains, frequently include expression-associated mutations experimentally shown as activating and conferring genetic dependency. Approximately 300 co-clustering phosphosites are verified in patient samples of 5 cancer types or previously implicated in cancer, including CTNNB1 p.S29/Y30, EGFR p.S720, MAPK1 p.S142, and PTPN12 p.S275. In summary, systematic 3D clustering analysis highlights nearly 3,000 likely functional mutations and over 1,000 cancer phosphosites for downstream investigation and evaluation of potential clinical relevance.


Subject(s)
Computational Biology/methods , Mutation , Neoplasms/genetics , Proteomics/methods , Binding Sites/genetics , Cluster Analysis , ErbB Receptors/metabolism , Humans , Mass Spectrometry/methods , Neoplasms/metabolism , Phosphorylation , Protein Tyrosine Phosphatase, Non-Receptor Type 12/metabolism , beta Catenin/metabolism
8.
Gastroenterology ; 159(6): 2203-2220.e14, 2020 12.
Article in English | MEDLINE | ID: mdl-32814112

ABSTRACT

BACKGROUND AND AIMS: The pattern of genetic alterations in cancer driver genes in patients with hepatocellular carcinoma (HCC) is highly diverse, which partially explains the low efficacy of available therapies. In spite of this, the existing mouse models only recapitulate a small portion of HCC inter-tumor heterogeneity, limiting the understanding of the disease and the nomination of personalized therapies. Here, we aimed at establishing a novel collection of HCC mouse models that captured human HCC diversity. METHODS: By performing hydrodynamic tail-vein injections, we tested the impact of altering a well-established HCC oncogene (either MYC or β-catenin) in combination with an additional alteration in one of eleven other genes frequently mutated in HCC. Of the 23 unique pairs of genetic alterations that we interrogated, 9 were able to induce HCC. The established HCC mouse models were characterized at histopathological, immune, and transcriptomic levels to identify the unique features of each model. Murine HCC cell lines were generated from each tumor model, characterized transcriptionally, and used to identify specific therapies that were validated in vivo. RESULTS: Cooperation between pairs of driver genes produced HCCs with diverse histopathology, immune microenvironments, transcriptomes, and drug responses. Interestingly, MYC expression levels strongly influenced β-catenin activity, indicating that inter-tumor heterogeneity emerges not only from specific combinations of genetic alterations but also from the acquisition of expression-dependent phenotypes. CONCLUSIONS: This novel collection of murine HCC models and corresponding cell lines establishes the role of driver genes in diverse contexts and enables mechanistic and translational studies.


Subject(s)
Carcinoma, Hepatocellular/genetics , Genetic Heterogeneity , Proto-Oncogenes/genetics , Animals , Carcinoma, Hepatocellular/immunology , Carcinoma, Hepatocellular/pathology , Cell Line, Tumor , Computational Biology , Disease Models, Animal , Drug Resistance, Neoplasm/genetics , Female , Gene Expression Regulation, Neoplastic/immunology , Humans , Liver Neoplasms/immunology , Liver Neoplasms/pathology , Male , Mice , Mice, Transgenic , Tumor Escape/genetics , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology
9.
PLoS One ; 12(10): e0185570, 2017.
Article in English | MEDLINE | ID: mdl-28982128

ABSTRACT

Understanding the molecular machinery involved in transcriptional regulation is central to improving our knowledge of an organism's development, disease, and evolution. The building blocks of this complex molecular machinery are an organism's genomic DNA sequence and transcription factor proteins. Despite the vast amount of sequence data now available for many model organisms, predicting where transcription factors bind, often referred to as 'motif detection', is still incredibly challenging. In this study, we develop a novel bioinformatic approach to binding site prediction. We do this by extending pre-existing SVM approaches in an unbiased way to include all possible gapped k-mers, representing different combinations of complex nucleotide dependencies within binding sites. We show the advantages of this new approach when compared to existing SVM approaches, through a rigorous set of cross-validation experiments. We also demonstrate the effectiveness of our new approach by reporting on its improved performance on a set of 127 genomic regions known to regulate gene expression along the anterior-posterior axis in early Drosophila embryos.
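
As a rough illustration of the gapped k-mer idea mentioned in this abstract, the sketch below enumerates the gapped k-mers of a DNA sequence and counts them as features. The function name `gapped_kmer_counts`, the `.` wildcard symbol, and the parameters `k` and `max_gaps` are assumptions made for this example and are not taken from the paper; the resulting counts could be passed to any SVM library as a feature vector.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, k=4, max_gaps=2):
    """Count gapped k-mers: length-k windows with up to max_gaps wildcard positions.

    Hypothetical helper for illustration only (not the authors' implementation).
    """
    counts = Counter()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for n_gaps in range(max_gaps + 1):
            # Keep the first and last positions fixed so every gap is internal.
            for gap_positions in combinations(range(1, k - 1), n_gaps):
                masked = list(window)
                for pos in gap_positions:
                    masked[pos] = "."
                counts["".join(masked)] += 1
    return counts


features = gapped_kmer_counts("ACGTACGT", k=4, max_gaps=1)
```

Fixing the end positions is one possible design choice: it avoids counting a gapped k-mer that is really a shorter k-mer with padding.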


Subject(s)
Machine Learning , Nucleotides/metabolism , Transcription Factors/metabolism , Binding Sites , Support Vector Machine
10.
PLoS One ; 11(12): e0167994, 2016.
Article in English | MEDLINE | ID: mdl-27992465

ABSTRACT

Exploring linkage disequilibrium (LD) patterns among the single nucleotide polymorphism (SNP) sites can improve the accuracy and cost-effectiveness of genomic association studies, whereby representative (tag) SNPs are identified to sufficiently represent the genomic diversity in populations. There has been a considerable amount of effort in developing efficient algorithms to select tag SNPs from the growing large-scale data sets. Methods using the classical pairwise-LD and multi-locus LD measures have been proposed that aim to reduce the computational complexity and to increase the accuracy, respectively. The present work solves the tag SNP selection problem by efficiently balancing the computational complexity and accuracy, and improves the coverage in genomic diversity in a cost-effective manner. The employed algorithm makes use of mutual information to explore the multi-locus association between SNPs and can handle different data types and conditions. Experiments with benchmark HapMap data sets show comparable or better performance against the state-of-the-art algorithms. In particular, as a novel application, genome-wide SNP tagging is performed on the 1000 Genomes Project data sets, producing a well-annotated database of tagging variants that capture the common genotype diversity in 2,504 samples from 26 human populations. Compared to conventional methods, the algorithm requires as input only the genotype (or haplotype) sequences, can scale up to genome-wide analyses, and produces accurate solutions with more information-rich output, providing an improved platform for researchers towards the subsequent association studies.
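
To make the role of mutual information in this abstract more concrete, here is a minimal sketch that estimates the mutual information between two SNPs from observed genotypes. It covers only the pairwise case, whereas the abstract describes a multi-locus use; the function name `mutual_information` and the 0/1/2 genotype coding are assumptions for this example, not the paper's algorithm.

```python
from collections import Counter
from math import log2


def mutual_information(snp_a, snp_b):
    """Mutual information (in bits) between two equally long genotype vectors.

    Hypothetical helper; genotypes are assumed to be coded as 0, 1 or 2.
    """
    if len(snp_a) != len(snp_b):
        raise ValueError("genotype vectors must have the same length")
    n = len(snp_a)
    count_a = Counter(snp_a)
    count_b = Counter(snp_b)
    count_ab = Counter(zip(snp_a, snp_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts.
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# A SNP that perfectly predicts another carries its full entropy;
# statistically independent SNPs share no information.
perfect_tag = mutual_information([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A SNP with high mutual information to its neighbours is a natural tag candidate, since observing it removes most of the uncertainty about them.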


Subject(s)
Algorithms , Chromosome Mapping/methods , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Base Sequence , Cluster Analysis , Databases, Genetic , Epistasis, Genetic , Expressed Sequence Tags , Genetic Association Studies , Haplotypes , Humans , Linkage Disequilibrium , Sequence Homology, Nucleic Acid
11.
EURASIP J Bioinform Syst Biol ; 2017(1): 2, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28127303

ABSTRACT

BACKGROUND: Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: for either the expression values of some genes at some time points or the entire expression values of a single time point or some sets of consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take as input the complete matrix of gene expression measurement. For instance, previous works have shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurement. Yet, to date, few algorithms have been proposed for the inference of gene regulatory network from gene expression data with missing values. RESULTS: We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and nonlinear natures of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule and the central difference rule for computing the related posteriors. The proposed algorithm is evaluated with satisfying results for synthetic networks, in silico networks released as a part of the DREAM project, and the real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of yeast Saccharomyces cerevisiae. CONCLUSION: PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we proposed a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces a better inference of the model parameters and hence, more accurate prediction of the underlying GRN compared to when using the conventional Gaussian approximation (GA) filters ignoring the missing data points.

12.
BMC Bioinformatics ; 16: 299, 2015 Sep 21.
Article in English | MEDLINE | ID: mdl-26388177

ABSTRACT

BACKGROUND: In most sequenced organisms the number of known regulatory genes (e.g., transcription factors (TFs)) vastly exceeds the number of experimentally-verified regulons that could be associated with them. At present, identification of TF regulons is mostly done through comparative genomics approaches. Such methods could miss organism-specific regulatory interactions and often require expensive and time-consuming experimental techniques to generate the underlying data. RESULTS: In this work, we present an efficient algorithm that aims to identify a given transcription factor's regulon through inference of its unknown binding sites, based on the discovery of its binding motif. The proposed approach relies on computational methods that utilize gene expression data sets and knockout fitness data sets which are available or may be straightforwardly obtained for many organisms. We computationally constructed the profiles of putative regulons for the TFs LexA, PurR and Fur in E. coli K12 and identified their binding motifs. Comparisons with an experimentally-verified database showed high recovery rates of the known regulon members, and indicated good predictions for the newly found genes with high biological significance. The proposed approach is also applicable to novel organisms for predicting unknown regulons of the transcriptional regulators. Results for the hypothetical protein Dde0289 in D. alaskensis include the discovery of a Fis-type TF binding motif. CONCLUSIONS: The proposed motif-based regulon inference approach can discover the organism-specific regulatory interactions on a single genome, which may be missed by current comparative genomics techniques due to their limitations.


Subject(s)
Algorithms , Escherichia coli Proteins/metabolism , Escherichia coli/genetics , Regulon/genetics , Transcription Factors/metabolism , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Base Sequence , Binding Sites , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/genetics , Protein Binding , Repressor Proteins/chemistry , Repressor Proteins/genetics , Repressor Proteins/metabolism , Serine Endopeptidases/chemistry , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism , Transcription Factors/chemistry , Transcription Factors/genetics
13.
BMC Genomics ; 14: 645, 2013 Sep 23.
Article in English | MEDLINE | ID: mdl-24059285

ABSTRACT

BACKGROUND: Xor-genotype is a cost-effective alternative to the genotype sequence of an individual. Recent methods developed for haplotype inference have aimed at finding the solution based on xor-genotype data. Given the xor-genotypes of a group of unrelated individuals, it is possible to infer the haplotype pairs for each individual with the aid of a small number of regular genotypes. RESULTS: We propose a framework of maximum parsimony inference of haplotypes based on the search of a sparse dictionary, and we present a greedy method that can effectively infer the haplotype pairs given a set of xor-genotypes augmented by a small number of regular genotypes. We test the performance of the proposed approach on synthetic data sets with different numbers of individuals and SNPs, and compare the performance with the state-of-the-art xor-haplotyping methods PPXH and XOR-HAPLOGEN. CONCLUSIONS: Experimental results show good inference qualities for the proposed method under all circumstances, especially on large data sets. Results on a real database, CFTR, also demonstrate significantly better performance. The proposed algorithm is also capable of finding accurate solutions with missing data and/or typing errors.
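
For readers unfamiliar with xor-genotypes, the following sketch shows how one could be derived from a pair of binary haplotypes and why a few regular genotypes are still needed. The helper names and the 0/1/2 genotype coding are assumptions for this illustration, and the sketch does not implement the inference method proposed in the paper.

```python
def xor_genotype(hap1, hap2):
    """Per-site XOR of two binary haplotypes: 1 = heterozygous, 0 = homozygous."""
    return [a ^ b for a, b in zip(hap1, hap2)]


def regular_genotype(hap1, hap2):
    """Assumed coding: 0 or 1 = homozygous for that allele, 2 = heterozygous."""
    return [2 if a != b else a for a, b in zip(hap1, hap2)]


# Two different haplotype pairs that differ only at homozygous sites ...
pair_a = ([0, 1, 1, 0], [0, 0, 1, 1])
pair_b = ([1, 1, 0, 0], [1, 0, 0, 1])

# ... produce the same xor-genotype but different regular genotypes.
same_xor = xor_genotype(*pair_a) == xor_genotype(*pair_b)
same_regular = regular_genotype(*pair_a) == regular_genotype(*pair_b)
```

Because the xor-genotype hides which allele is present at homozygous sites, the two pairs above are indistinguishable from xor data alone, which is consistent with the abstract's use of a small number of regular genotypes to resolve the ambiguity.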


Subject(s)
Computational Biology/methods , Haplotypes , Models, Genetic , Algorithms , Humans , Polymorphism, Single Nucleotide
14.
Nucleic Acid Ther ; 23(2): 140-51, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23557118

ABSTRACT

Molecular barcode arrays are widely employed in the analysis of large strain libraries, whereby probes linked to unique oligonucleotides ("antitags") are used to detect selected DNA targets ("tags") by highly specific hybridization. One of the major problems for such screen designs is thus ensuring a high degree of probe-target specificity and a low level of nonspecific binding (in sum, "orthogonality") across the entire tag population ("collection"). Several approaches have been previously proposed for designing orthogonal DNA tags by, among others, focusing on their individual or pair-wise structures, such as Smith-Waterman sequence similarity, the widely used nearest neighbor method, and full thermodynamic estimates of sequences. However, these methods generally involve imposing various heuristic constraints ("design rules") on possible tag/antitag (TaT) sequences in order to achieve probe-target specificity across the collection. The resulting lack of freedom in considering all putative sequences can lead to potentially suboptimal designs and to the ensuing reduction in the degree of orthogonality within the constructed TaT collections. Here, we demonstrate that a randomized-search algorithm based on simulated annealing optimization can be used in order to substantially free the design process from the limitations of sequence constraints, allowing for the elucidation of potentially more optimal DNA tag collections. The designed sets of DNA oligonucleotides are optimized for the highest degree of orthogonality as quantified by melting temperature Tm, an experimentally relevant system property, which could also be used as a theoretically meaningful thermodynamic metric for optimizing TaT binding specificity. That is, this work describes an approach to constructing tag/antitag libraries, which offer the greatest melting temperature separation between specific probe-target duplexes and other nonspecific structures. The proposed method finds, with high probability, the global solution that maximizes the difference in Tm between the specific and nonspecific tag-antitag hybridizations across a collection of given size for TaTs of specified length. An application of this approach is demonstrated using 2 different DNA probe sets.
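
The randomized search described in this abstract can be sketched in generic form as follows. This is only an illustration of simulated annealing over a tag collection: the objective used here, the minimum pairwise Hamming distance between tags, is a deliberately simple stand-in and is not the melting-temperature (Tm) metric the abstract describes. The function names, the cooling schedule, and all default parameters are assumptions.

```python
import math
import random


def min_pairwise_distance(tags):
    """Smallest Hamming distance between any two tags (stand-in objective, not Tm)."""
    return min(
        sum(x != y for x, y in zip(a, b))
        for i, a in enumerate(tags)
        for b in tags[i + 1:]
    )


def anneal_tags(n_tags=6, length=8, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over tag collections; returns (best_tags, best_score)."""
    rng = random.Random(seed)
    tags = ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n_tags)]
    score = min_pairwise_distance(tags)
    best_tags, best_score = list(tags), score
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a single-base change in one randomly chosen tag.
        i, pos = rng.randrange(n_tags), rng.randrange(length)
        old = tags[i]
        tags[i] = old[:pos] + rng.choice("ACGT") + old[pos + 1:]
        new_score = min_pairwise_distance(tags)
        # Accept non-worsening moves; accept worse moves with Boltzmann probability.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temp):
            score = new_score
            if score > best_score:
                best_tags, best_score = list(tags), score
        else:
            tags[i] = old  # reject the proposal and restore the previous tag
    return best_tags, best_score


best_tags, best_score = anneal_tags()
```

Occasionally accepting worse collections at high temperature is what lets the search leave local optima without hand-written design rules; a thermodynamic Tm model would replace `min_pairwise_distance` in a realistic setting.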


Subject(s)
DNA Barcoding, Taxonomic , DNA Probes , DNA/genetics , Nucleic Acid Denaturation , Algorithms , DNA/chemistry , Nucleic Acid Hybridization/genetics , Oligonucleotides/chemistry , Oligonucleotides/genetics